On the importance and incorporation of additional knowledge in cluster analysis

Author

  • Jan Feyereisl
Abstract

Data without labels is commonly analysed with unsupervised machine learning techniques. Although abundant expert knowledge exists in many areas where unlabelled data is examined, such knowledge is frequently not incorporated into automatic analysis. Semi-supervised learning allows additional knowledge to be incorporated with the help of labels or constraints. However, it is the field of supervised learning, and the recently proposed advanced paradigm of learning using privileged information, that provides an intriguing concept for incorporating a special type of additional knowledge. In this thesis we explore the question of the importance and incorporation of such additional knowledge within unsupervised learning. Our analysis is performed from four different viewpoints, namely anomaly detection, cluster interpretation, visualisation and identification. The functionality of signal fusion and low-level pattern recognition in the human immune system is our inspiration. A more practical set of immunology-derived techniques is developed, allowing for the fusion of additional information for improved anomaly detection (UR-STD) and for cluster interpretation and visualisation (StOrM). The success of these techniques within computer security and process behaviour scenarios encouraged further exploration of additional knowledge incorporation at a more general level. Adoption of the advanced supervised learning paradigm for the unsupervised setting instigates investigation into the difference between privileged and technical data. By means of our proposed aRi-MAX method, the stability of the K-Means algorithm is improved and identification of the best clustering solution is achieved on an artificial dataset. Subsequently, an information-theoretic, dot-product-based algorithm called P-Dot is proposed. This method has the ability to utilise a wide variety of clustering techniques, individually or in combination, while fusing privileged and technical data for improved clustering. Application of the P-Dot method to the task of digit recognition confirms our previous findings in a real-world scenario. Finally, the nature of privileged information is investigated more formally and its association with the notion of data generators is revealed. Experiments using standard machine learning datasets uncover the hypothetical existence of disparate data generators, which can provide benefits for unsupervised learning when treated uniquely, i.e. equally. The incorporation of additional knowledge within cluster analysis can be beneficial, especially when generators are identified and treated separately and with regard to their nature.

Similar resources

Intellectual structure of knowledge in Nanomedicine field (2009 to 2018): A Co-Word Analysis

Introduction: The Co-word analysis has the ability to identify the intellectual structure of knowledge in a research domain and reveal its subsurface research aspects. Objective: This study examines the intellectual structure of knowledge in the field of nanomedicine during the period of 2009 to 2018 by using Co-word analysis. Materials and Methods: This paper develops a sciento...

Incorporation of 12-Tungstophosphoric acid in Titania spheres and fabrication of core-shell Polyoxotungstate/Titania nanostructures

Core-shell 12-tungstophosphoric acid/TiO2 (HPW/TiO2) nanoparticles at 10 and 20% of HPW have been synthesized by a simple in-situ sol-gel method. Characterization of the samples was carried out by X-ray diffraction (XRD), scanning electron microscopy (SEM) and Fourier transform infrared (FTIR) spectroscopy. The X-ray diffraction patterns of as-prepared solid samples indicated characterist...

Abstract Concepts Through the Lens of Linguistic and Extra-Linguistic Knowledge

The paper deals with the rigorous methods used in the research of concepts representing abstract notions like “friendship”, “love”, “hatred”, “conscience”, and “envy”. Concepts of that kind have no visible physical support in the material world except for the sound forms of the words representing them, thus causing additional difficulties in classification, research and analysis as well as stip...

Interfirm Alliance Interactions and knowledge Learning: A Conceptual Research Model

Alliances raise many knowledge transfer and interfirm learning issues that have implications for how the alliance partners manage their cooperative learning activities in the alliance system. Many of these implications are grounded in the assumption that partners in the alliances have routines for transferring knowledge, learning, and gaining management efficiencies. Thus organisations can support ...

Publication date: 2010